An automatic car plate localization and recognition system identifies the location 
of car plates in input images and recognizes the characters on them. Within 
the automated system, car plate localization is the first and most crucial stage, 
as the success rate of the whole system depends heavily on it. In this paper, 
a Malaysian car plate localization system using a Region-based Convolutional 
Neural Network (R-CNN) is proposed. Using transfer learning on the AlexNet CNN, 
localization was greatly improved, achieving a best precision and recall 
of 95.19% and 97.84% respectively. In addition, the proposed R-CNN was able 
to localize car plates in complex scenarios such as occlusion. 
1. INTRODUCTION 

An automated car plate localization and recognition system is one component of an intelligent 
transportation system. It uses image-processing techniques to identify car plates in input images 
without human intervention. Such systems have a wide range of applications, including traffic 
monitoring, car park access, traffic law enforcement, automatic toll payment and border crossing 
control [1-3]. To achieve good car plate recognition results, the system must first accurately localize 
the car plate in the input image. In general, two major families of algorithms are applied to automatic 
car plate localization: algorithms based on handcrafted features and algorithms based on deep learning. 
Examples of handcrafted features include the location of straight lines, edge density, connectivity 
information, and color information. In Malaysia, however, multiple car plate formats are in use [4,5]. 
Consequently, methods based on handcrafted features have difficulty handling the non-standardized 
Malaysian car plate formats. 

In [6], the authors proposed localizing the car plate by applying the Hough Transform to detect 
straight lines in the image and deriving the bounding box from the location of those lines. 
However, the Hough Transform requires large memory and has a high computation time. In [7-9], 
the Sobel operator was used to localize car plates based on vertical edge detection. The limitations 
of vertical edge detection are that it requires prior knowledge about the edge information in the image 
and that it is sensitive to noise. The authors in [10] proposed localizing car plates based on connected 
components analysis (CCA), while the authors in [11] proposed using image binarization with the Otsu 
method. CCA may generate broken objects, while the Otsu method is less accurate when the car body 
and the car plate have the same color. 

The deep learning approach, also known as deep structured learning or hierarchical learning [12], 
replaces the need for handcrafted features [13], as the algorithm determines and learns the features 
itself from a large number of training images [14]. Various deep learning architectures are available, 
including the Convolutional Neural Network (CNN), Deep Belief Network (DBN) and Deep Stacking 
Network (DSN) [12]. Each architecture has its own application area; for instance, CNN is suitable 
for object recognition [15]. Previous work has applied CNN-based deep learning algorithms to localize 
car plates [14,16-18], but with some limitations: a high false negative rate because low-resolution 
datasets were used in the training process, the assumption that there is only one car plate per image, 
and no consideration of the complexity of Malaysian car plates. These methods are therefore not suitable 
for a Malaysian car plate localization system. Hence, a Malaysian car plate localization system using 
a Region-based Convolutional Neural Network (R-CNN) is proposed. With this deep learning approach, 
the dependency on handcrafted features is eliminated and the non-standardized Malaysian car plate 
formats can be handled by the proposed system. The proposed system also aims to localize all car plates 
in an input image when multiple car plates are present. The rest of this paper is organized as follows. 
Section 2 describes the proposed algorithm, the tools used and the evaluation criteria used to evaluate 
the developed system. Results, analysis, and discussion are presented in Section 3. Finally, the conclusion 
and recommendations for future work are given in Section 4. 


2. CAR PLATE LOCALIZATION WITH R-CNN 

The scope of this work is Malaysian car plates with a black background and white characters 
for normal cars, as well as a white background and black characters for taxis. The circumstances 
considered include multiple car plates (up to 3 car plates per image), special car plates (Putrajaya, 
Perodua, etc.), car plates with other logos, partially blocked car plates and car plates captured at an angle. 

2.1. Image acquisition and preprocessing 

The image acquisition process collects the input dataset required to train and test the proposed 
system. In this work, a total of 1,157 Malaysian car plate images were acquired, consisting of front 
and rear car images at 13-megapixel resolution in normal daylight conditions. Each image contains 
a minimum of 1 and a maximum of 3 car plates. The car plates take up approximately 20% to 30% 
of the whole image area. Images were also captured at camera angles of between 0° and 30°. Images 
are first converted to grayscale, as Malaysian car plates consist of only black and white colors. Next, 
images are resized from the original resolution of 4128x2322 to 826x465 to reduce computation time 
while maintaining car plate image quality. After that, a median filter is applied to the grayscale images 
to remove image noise [19], followed by image intensity adjustment to enhance the contrast. 
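
The preprocessing chain described above can be sketched in plain Python as follows. This is a minimal illustration and not the authors' implementation: the function names, the BT.601 luminance weights, nearest-neighbour resizing, the 3x3 median window and the min-max contrast stretch are all assumptions, since the paper does not specify these details.

```python
from statistics import median


def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to luminance (assumed BT.601 weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def resize_nearest(image, new_width, new_height):
    """Nearest-neighbour resize (the interpolation method is an assumption)."""
    old_height, old_width = len(image), len(image[0])
    return [[image[y * old_height // new_height][x * old_width // new_width]
             for x in range(new_width)]
            for y in range(new_height)]


def median_filter_3x3(image):
    """Replace each pixel by the median of its 3x3 neighbourhood (edges clamped)."""
    height, width = len(image), len(image[0])

    def pixel(y, x):
        return image[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]

    return [[int(median(pixel(y + dy, x + dx)
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
             for x in range(width)]
            for y in range(height)]


def stretch_contrast(image):
    """Linearly map the darkest pixel to 0 and the brightest pixel to 255."""
    low = min(min(row) for row in image)
    high = max(max(row) for row in image)
    if high == low:  # flat image: nothing to stretch
        return [row[:] for row in image]
    return [[round((p - low) * 255 / (high - low)) for p in row] for row in image]


def preprocess(rgb_image, new_width, new_height):
    """Grayscale -> resize -> median filter -> contrast stretch, in the paper's order."""
    gray = to_grayscale(rgb_image)
    small = resize_nearest(gray, new_width, new_height)
    denoised = median_filter_3x3(small)
    return stretch_contrast(denoised)
```

With the resolutions quoted in the text, the call would be `preprocess(image, 826, 465)`, which corresponds to a downscaling factor of roughly 5 in each dimension.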

2.2. Training the AlexNet 

AlexNet is a pre-trained CNN created by Alex Krizhevsky et al. [20]. It is an 8-layer CNN model 
trained on a total of 1.2 million training images from the ImageNet Large Scale Visual Recognition 
Challenge 2010 (ILSVRC-2010) with 1,000 object classes. R-CNN is a network that combines a CNN 
with region proposals [21] and has been proven successful for semantic segmentation [22] and object 
detection [23]. To perform transfer learning based on AlexNet, the last few layers of AlexNet are 
replaced with layers applicable to the current task. The last fully connected layer is changed to produce 
two output classes: the car plate class and the background class. Transfer learning is then carried out 
in 3 iterations. In the first iteration, ground-truths are labeled by enclosing only the car plate numbers, 
so that the R-CNN network learns the car plate characters. Figure 1(a) shows an example of labeled 
ground-truth for transfer learning in iteration 1. In this iteration, a total of 693 labeled training images 
(training set 1) were used to train the R-CNN network. In the second iteration, transfer learning starts 
from the trained network obtained in iteration 1 (itr1_net). First, itr1_net is used to perform car plate 
localization on another training dataset (training set 2), which is made up of 232 training images. Then, 
based on the localization results on training set 2, all of the network’s false positive locations are labeled 
as the background class, while the true car plate locations are labeled as the car plate class. The labeled 
training set 2 is then used to train the R-CNN network in this second iteration, which reduces the false 
positive detections of the network. Lastly, further training is carried out on the trained network obtained 
from iteration 2 (itr2_net). In this third iteration, 300 training images from training set 1 are relabeled 
so that both the car plate edges and the car plate characters are enclosed by the ground-truth bounding 
boxes. The purpose is to allow the network to learn that the car plate numbers are located on car plates. 
The relabeled training images are used to perform transfer learning on itr2_net. 
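
The labeling rule used in the second iteration can be illustrated with a short sketch. It assumes that boxes are (x, y, width, height) tuples and that a detection from the iteration 1 network counts as a false positive when its IoU with every ground-truth box is below a hypothetical `match_iou` of 0.4. The paper does not state how the labels were actually assigned (they may well have been drawn by hand), so this is only one possible reading.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def build_iteration2_labels(detections, ground_truths, match_iou=0.4):
    """Label true plate locations as 'carplate' and unmatched detections as 'background'.

    match_iou is a hypothetical threshold chosen for illustration only.
    """
    labels = [(gt, "carplate") for gt in ground_truths]
    for det in detections:
        # A detection that overlaps no ground-truth plate is a false positive,
        # so it becomes a 'background' training example.
        if all(iou(det, gt) < match_iou for gt in ground_truths):
            labels.append((det, "background"))
    return labels
```

Feeding the network its own mistakes as negative examples is what lets the second training round suppress recurring false positives.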



Figure 1. (a-c) Examples of labeled ground truth images used during training in iterations 1, 2, 3 respectively 


2.3. Post-processing 

The post-processing stage filters the bounding boxes generated by the R-CNN network during 
the car plate localization testing phase. First, a confidence filtering stage is applied to the generated 
bounding boxes. The R-CNN network produces bounding boxes together with a localization confidence 
factor for each classified car plate region. The generated bounding boxes are filtered with a preset 
confidence threshold t, where only the bounding boxes with a confidence factor greater than 
the threshold are taken as localization results. The chosen threshold is 0.92, obtained experimentally 
by running car plate localization on training set 2. If none of the generated bounding boxes has 
a confidence factor greater than 0.92, the bounding box with the highest confidence is selected. 
This filtering stage effectively eliminates false positive regions detected by the R-CNN network. 
After that, overlapping bounding boxes, i.e. any bounding boxes with an intersection over union (IoU) 
ratio greater than a threshold r, are merged. The chosen threshold r is 0.15. The localization confidence 
factor of the resulting merged bounding box is calculated as the average confidence of all 
the overlapping bounding boxes. 
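
A minimal sketch of the two post-processing steps is given below, using the thresholds t = 0.92 and r = 0.15 from the text. The (x, y, width, height) box format, the function names, and the choice of the smallest enclosing rectangle as the geometry of a merged box are assumptions; the paper specifies only how the merged confidence is computed.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def enclosing_box(box_a, box_b):
    """Smallest box containing both inputs (assumed merge geometry)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    x1, y1 = min(ax, bx), min(ay, by)
    x2, y2 = max(ax + aw, bx + bw), max(ay + ah, by + bh)
    return (x1, y1, x2 - x1, y2 - y1)


def filter_by_confidence(detections, t=0.92):
    """Keep (box, confidence) pairs above t; fall back to the single best box."""
    kept = [d for d in detections if d[1] > t]
    if not kept and detections:
        kept = [max(detections, key=lambda d: d[1])]
    return kept


def merge_overlapping(detections, r=0.15):
    """Merge boxes whose IoU exceeds r; merged confidence is the members' mean."""
    # Each group holds the current enclosing box and its member confidences.
    groups = [[box, [conf]] for box, conf in detections]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if iou(groups[i][0], groups[j][0]) > r:
                    groups[i][0] = enclosing_box(groups[i][0], groups[j][0])
                    groups[i][1].extend(groups[j][1])
                    del groups[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan, since the box list has changed
    return [(box, sum(confs) / len(confs)) for box, confs in groups]
```

A typical call would be `merge_overlapping(filter_by_confidence(detections))`, where `detections` is the list of (box, confidence) pairs returned by the network for one image.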

2.4. Evaluation method 

To evaluate the performance of the proposed system, several evaluation criteria are applied, 
namely the number of true positives (TP), false positives (FP) and false negatives (FN), the precision 
rate and the recall rate [14]. These criteria quantify the accuracy of the car plate localization results 
of the proposed system. TP refers to detected regions that contain real car plates. In this paper, 
the IoU ratio between the detected regions and the ground-truth car plate regions is computed during 
the evaluation stage. A detected region is considered a TP if its IoU ratio with the ground-truth 
is greater than 0.4. FP refers to detected regions that do not contain any car plate, i.e. regions 
that have been incorrectly detected. FN refers to misses, i.e. real car plates that have not been detected 
by the system. We define the precision rate as the total number of correctly detected car plates over 
the total number of detected regions. It provides an insight into the number of false alarms: a system 
with a low precision rate generates many bounding boxes, only a few of which contain correctly 
detected car plates. We define the recall rate as the total number of correctly detected car plates over 
the total number of ground-truth car plates. This criterion indicates how many of the ground-truth 
car plates have been successfully detected. If the system is unable to localize the majority 
of the ground-truth car plates, it will have a low recall rate. 
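
The criteria defined above can be computed for a single image with a sketch such as the following. The IoU threshold of 0.4 comes from the text; the (x, y, width, height) box format, the function names and the one-to-one matching of detections to ground-truth boxes are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def evaluate(detections, ground_truths, iou_threshold=0.4):
    """Count TP, FP and FN for one image and derive precision and recall."""
    matched = set()  # indices of ground-truth boxes already claimed by a detection
    tp = fp = 0
    for det in detections:
        best_idx, best_iou = None, 0.0
        for idx, gt in enumerate(ground_truths):
            if idx in matched:
                continue
            overlap = iou(det, gt)
            if overlap > best_iou:
                best_idx, best_iou = idx, overlap
        if best_idx is not None and best_iou > iou_threshold:
            matched.add(best_idx)
            tp += 1
        else:
            fp += 1
    fn = len(ground_truths) - len(matched)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}
```

Averaging the per-image precision and recall over all test images would then give the averaged rates reported in the results.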

2.5. Creating a benchmark 

Two other approaches were used to train 2 additional R-CNN networks for car plate localization. 
The localization results produced by these 2 networks are then used as a benchmark against the results 
produced by AlexNet trained with transfer learning. In the first approach, an R-CNN network was trained 
from scratch using the captured car plate images. A 5-layer CNN model was built, consisting 
of 3 convolutional layers and 2 fully connected layers. As in the transfer learning approach, the last fully 
connected layer is made up of two neurons in order to output 2 classes, the car plate class 
and the background class. In this approach, the R-CNN network was trained with a total of 693 training 
images, which is about 60% of the total captured images. In the second approach, transfer learning based 
on a CIFAR-10 network was carried out. The CIFAR-10 network is a pre-trained network created 
by Alex Krizhevsky [24]. It is a 5-layer CNN model trained with 50,000 training images. The training 
images are made up of 10 classes: cat, dog, bird, horse, frog, deer, airplane, ship, automobile and truck. 
In this transfer learning approach, the last few layers of the pre-trained network were replaced with layers 
suited to car plate localization. The first fully connected layer was changed to 256 neurons instead 
of the original 64 neurons to allow the network to relate more learned features to the characteristics 
of the different classes. The last fully connected layer was changed from 10 neurons to 2 neurons, 
which corresponds to 2 output classes, car plate and background. As in the first approach, the R-CNN 
network was trained with 693 training images. 


3. RESULTS 

A total of 3 training approaches are applied in this work, namely training a new network 
from scratch (approach 1), transfer learning based on the CIFAR-10 network (approach 2), and transfer 
learning based on AlexNet (approach 3). Figure 2 shows the training graphs for all three approaches. 
The mini-batch size is the number of training images propagated through the network during each 
iteration. One mini-batch is a subset of the total training images available. When all the mini-batches 
have passed through the network once, one epoch is complete. For the new network trained from scratch, 
the total training time is 4.2 hours. From the training graph in Figure 2(a), the mini-batch loss 
of the network saturates at 0.45 while the mini-batch accuracy saturates at 80%. This is because 
the number of training images available in this project is small, and hence the network accuracy 
cannot improve further with this number of training images. 

Figure 2. (a-c) Mini-batch loss and mini-batch accuracy training graphs in approaches 1, 2, 3 respectively 

As for the CIFAR-10 network, the training time is 0.52 hours. The training time for this approach 
is shorter because the network has a smaller image input layer. As a result, fewer multiply-accumulate 
operations are performed and less memory is needed. From the training graph in Figure 2(b), 
the average mini-batch loss for the network is around 0.15 while the average mini-batch accuracy 
is around 95%, which outperforms the first approach. Figure 2(c) shows the training graph for transfer 
learning based on AlexNet in iteration 3. A total of 5.9 hours was taken to train the R-CNN network 
in this iteration. The training time is relatively long because AlexNet is a large neural network with more 
layers and a larger image input layer. As a result, more multiply-accumulate operations are performed 
and more memory is used. As illustrated in the training graph, the mini-batch loss approaches 0 while 
the mini-batch accuracy approaches 100%, which indicates that the trained network is able to achieve 
high accuracy. 

3.1. Car plate localization results 

In this work, a total of 232 test images (around 20% of the total acquired images) were used 
to test the developed car plate localization system. Figure 3 shows the AlexNet car plate localization 
results for car plates captured in different situations. The proposed approach is able to localize car plates 
accurately in different situations, e.g. incomplete car plates (missing 1 car plate character), special 
Malaysian car plates, multiple car plates, occluded car plates and car plates captured at an angle, as shown 
in Figure 3. Table 1 shows the results for the evaluation criteria. The average precision and recall 
are calculated by averaging the precision and recall results over all test images. The “total detected 
regions” is the total number of bounding boxes produced by the car plate localization system. 
The “average IoU with TP” is the average IoU between the detected TP and the manually labeled 
ground-truths. The “requires merging” criterion is the total number of bounding boxes that required 
merging. Lastly, the “average TP confidence” is calculated by averaging the confidence factors 
of all TP bounding boxes. 



Table 1. Evaluation criteria results 



Criterion                    Proposed   CIFAR-10   AlexNet   AlexNet
                             network               Iter. 1   Iter. 3
Average Precision (%)        19.4       82.22      93.64     95.19
Average Recall (%)           18.32      86.28      97.92     97.84
Total detected regions       232        295        290       289
Total true positives (TP)    45         222        258       265
Total false positives (FP)   187        73         32        24
Total false negatives (FN)   226        54         14        11
Average IoU with TP          0.6061     0.6588     0.7525    0.7346
Requires merging             0          83         32        5
Average TP confidence        0.5648     0.9856     0.9950    0.9964


Figure 3. Car plate localization results 


The results show that the average precision and recall rates increased from training a new network 
from scratch (approach 1) to three-iteration transfer learning based on AlexNet (approach 3). 
The same trend is observed in the total number of TP. This indicates that the system performance 
improved when moving from approach 1 to approach 3. On the other hand, the number of FP and FN 
decreased from approach 1 to approach 3. This shows that the system was able to produce better quality 
results with less noise. In addition, the average IoU between TP and ground-truths was recorded 
at around 0.6 to 0.75, with the highest value achieved in approach 3. This shows that the system 
developed in approach 3 produces bounding boxes that closely match the labeled ground-truth. 
From the evaluation criteria results, the highest precision and recall rates were achieved in approach 3, 
at 95.19% and 97.84% respectively. 

Out of the 289 detected regions in approach 3, 265 regions are true positives, which is 91.7% 
of the total detected regions. This shows that the proposed method is able to localize car plates with 
few false detections. The average IoU between TP and ground-truths was recorded at 0.7346. 
In addition, the total number of bounding boxes that required merging in approach 3 is very low. 
This means that the R-CNN network is able to produce bounding boxes that completely enclose 
the car plate regions and is less likely to require the merging stage. As for the average TP confidence, 
the trained network achieves very high confidence values for the detected TP, close to 1. 
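
The 91.7% figure can be reproduced directly from the TP, FP and FN counts in Table 1. The short sketch below does this for all four columns; the dictionary keys and the function name are illustrative. Note that these are rates pooled over the whole test set, whereas the averages in Table 1 are per-image means, so the two sets of numbers need not coincide.

```python
# TP, FP and FN counts copied from Table 1 (keys are illustrative labels).
table1_counts = {
    "scratch":       {"tp": 45,  "fp": 187, "fn": 226},
    "cifar10":       {"tp": 222, "fp": 73,  "fn": 54},
    "alexnet_iter1": {"tp": 258, "fp": 32,  "fn": 14},
    "alexnet_iter3": {"tp": 265, "fp": 24,  "fn": 11},
}


def pooled_rates(counts):
    """Precision and recall (in %) pooled over all test images."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = 100.0 * tp / (tp + fp)
    recall = 100.0 * tp / (tp + fn)
    return precision, recall


for name, counts in table1_counts.items():
    precision, recall = pooled_rates(counts)
    print(f"{name}: pooled precision {precision:.1f}%, pooled recall {recall:.1f}%")
```

For iteration 3 of AlexNet this gives 265 / 289 = 91.7% pooled precision and 265 / 276 = 96.0% pooled recall.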

3.2. Analysis 

The evaluation results show that the R-CNN network produces accurate car plate localization 
results. Further training of AlexNet in iteration 3 produced even better results than the first iteration. 
This is because the network in the third iteration has learned that car plate numbers are found 
on car plates, instead of just learning to recognize characters and numbers. This helps the car plate 
localization system to identify the region of car plates within the input images. The distribution 
of TP confidence factors for AlexNet in iteration 3, shown in Figure 4, further supports the localization 
accuracy of the network. Most of the confidence factors for the TP are greater than 0.99, which indicates 
that the iteration 3 network produces accurate localization results with a high confidence factor 
on most of the test images. Comparing the AlexNet results with the CIFAR-10 network results, 
the CIFAR-10 network has relatively lower precision and recall rates. This is because AlexNet is a large 
neural network with deep layers that was previously trained on a very large dataset (1.2 million images). 
As a result, AlexNet has learned a larger number of features from its previous training than 
the CIFAR-10 network, which was trained with 50,000 images. 

On the other hand, the newly built network trained from scratch produced weaker results, 
with lower precision and recall rates. This is because the dataset available for training the R-CNN 
network is small compared to the datasets on which the pre-trained networks were trained. As a result, 
training the new network from scratch on the small dataset, without transfer learning, resulted in lower 
car plate localization accuracy. This shows that if the training dataset is not sufficiently large, 
it is preferable to perform transfer learning instead of training a new network from scratch. Even though 
AlexNet in iteration 3 produces accurate car plate localization results, some false positives are still 
detected, as shown in Figure 5. Generally, the false positives detected by the system can be divided into 
two main types: car model names and background regions. Some car model names are incorrectly detected 
as car plate regions because they also contain characters and numbers, which makes them very similar 
to car plate regions. Background regions such as car grilles have also been incorrectly detected as car 
plate regions. This is because car grilles have a well-defined boundary, which is similar to that 
of car plates. 



Figure 4. Distribution graph of TP confidence factor for iteration 3 network 


Figure 5. Examples of false positives 


3.3. Comparison with existing work 

To benchmark the developed car plate localization system, a comparison with previous work 
was performed. Currently, no deep learning based work has been done on Malaysian car plates. 
Moreover, none of the previous work uses the R-CNN approach to perform car plate localization. 
Therefore, the comparison was made against the previous work that is closest, though not identical, 
to the current work. Although the comparison has limitations, as it is not one-to-one, it serves 
as an indicator of the current standing of this work. The previous work by Hui Li and Chunhua 
Shen [14] was chosen as the benchmark for comparison. Table 2 shows the comparison between [14] 
and the current work. The authors of the previous work evaluated their system by running car plate 
localization on the Caltech dataset [25]. 

The Caltech dataset is made up of 126 US car plate images. All the images in the dataset are 
images of cars captured from the rear in the parking lots of the California Institute of Technology 
(Caltech). To compare the current work with the previous work, the proposed work was also tested 
on the Caltech dataset. Based on the precision and recall rates shown in Table 2, even though 
the proposed method was trained purely on Malaysian car plates, the system is still able to produce 
results comparable to the previous work when applied to the Caltech dataset. This shows that 
the R-CNN in the current work generalizes sufficiently to work on different datasets. As for 
execution performance, the processing time for the previous work in [14] (NVIDIA Tesla K40c, 6GB) 
is 5 sec, while the proposed work’s processing times are 18 sec (NVIDIA GeForce 940MX, 4GB) 
and 4.5 sec (NVIDIA GeForce GTX1060, 6GB). Re-running the proposed work on the NVIDIA GeForce 
GTX1060 achieves a processing time similar to that of the previous work. This shows that the processing 
time of the deep learning algorithm is heavily dependent on the hardware resources available. 
Figure 6 shows some examples of localization results for the proposed method on the Caltech dataset. 
The proposed method is able to accurately localize car plates in the input images even though 
the images show the US plate format. 


Table 2. Comparison with previous work 

Work 
  Previous work: Reading Car License Plates Using Deep Convolutional Neural Networks and LSTMs 
  (H. Li and C. Shen, 2016) 
  Current work: Current project 

Hardware 
  Previous work: NVIDIA Tesla K40c with 6GB memory 
  Current project: NVIDIA GeForce 940MX with 4GB memory 

Network architecture 
  Previous work: Cascaded CNN with false positive elimination by handcrafted features 
    a. CNN1: 
       - 4-layer 37-class CNN for text detection 
    b. Refine bbox: 
       - geometric constraints + vertical edge detection using Sobel operator 
    c. CNN2: 
       - 4-layer 2-class CNN for plate/non-plate detection 
  Current project: R-CNN with 3 training iterations 
    a. Transfer learning from AlexNet 
    b. Region proposal network: 
       - Propose 2000 regions that most probably contain an object 
    c. CNN: 
       - 8-layer 2-class AlexNet CNN for plate detection 

Training dataset 
  Previous work: 
    CNN1: 
      a. 1.38 x 10^5 character images + 9 x 10^5 non-character images 
    CNN2: 
      a. 3000 car plates from different countries 
      b. 5 x 10^4 synthesized license plates 
      c. 4 x 10^5 background images 
  Current project: All training images are Malaysian car plates 
    Training iteration 1: 
      a. 689 training images with labelled ground-truth (training set 1) 
    Training iteration 2: 
      a. 232 training images with labelled ground-truth & background (training set 2) 
    Training iteration 3: 
      a. 300 training images with labelled ground-truth that includes plate edges (from training set 1) 

Evaluation on Caltech dataset (USA) 
  Previous work: Precision: 97.56, Recall: 95.24 
  Current project: Precision: 84.26, Recall: 94.44 

Processing time 
  Previous work: 5 sec 
  Current project: 18 sec; 4.5 sec (re-run on NVIDIA GeForce GTX1060 with 6GB memory) 



Figure 6. Car plate localization results on the Caltech dataset 


4. CONCLUSION 

In this work, a Malaysian car plate localization system has been successfully developed using 
a deep learning approach. The developed system is able to accurately localize car plates in input 
images by creating bounding boxes around them. The images used in this work include images with 
multiple car plates, special character car plates, car plates with logos on them, partially blocked car 
plates and car plates captured at different angles. Several potential improvements and modifications 
can further enhance the current car plate localization system. First, a car plate number recognition 
stage can be added after the car plate localization stage. With this stage, the car plate numbers 
in the localized regions can be recognized and, at the same time, any “background” type false positives 
produced by the localization stage can be eliminated effectively. In addition, a large number of car model 
name images can be collected and used to re-train the R-CNN network as a background class. This can 
help the network to reject text-based false positive regions, especially regions containing car model 
names. By doing so, the system can be made more robust as it will now be able to work in different 
weather and lighting conditions. 